[Ministral 3] Add ministral 3 #42498
@require_torch
class Ministral3ModelTest(CausalLMModelTest, unittest.TestCase):
I think this uses the wrong tester class
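For context on why the tester class matters: a shared test mixin of this kind usually builds its tiny configs and inputs through a model-specific tester that the test class names in a class attribute. Below is a minimal, stdlib-only sketch of that wiring. Every name in it (`CausalLMModelTestSketch`, `Ministral3ModelTesterSketch`, `model_tester_class`, `expected_model_type`) is a hypothetical stand-in for illustration, not the actual transformers test code from this PR.

```python
import unittest


class Ministral3ModelTesterSketch:
    """Hypothetical model-specific tester: builds a tiny config for the tests."""

    def __init__(self, parent):
        self.parent = parent
        self.vocab_size = 99
        self.hidden_size = 32

    def prepare_config(self):
        return {
            "model_type": "ministral3",
            "vocab_size": self.vocab_size,
            "hidden_size": self.hidden_size,
        }


class CausalLMModelTestSketch:
    """Hypothetical shared mixin; it is not a TestCase, so it is never collected alone."""

    # Assumption: each concrete test class points this at its own tester.
    model_tester_class = None
    expected_model_type = None

    def setUp(self):
        if self.model_tester_class is None:
            self.skipTest("model_tester_class is not set")
        self.model_tester = self.model_tester_class(self)

    def test_config_matches_model(self):
        # Pointing at another model's tester would fail here.
        config = self.model_tester.prepare_config()
        self.assertEqual(config["model_type"], self.expected_model_type)


class Ministral3ModelTestSketch(CausalLMModelTestSketch, unittest.TestCase):
    model_tester_class = Ministral3ModelTesterSketch
    expected_model_type = "ministral3"
```

With this layout, a wrong tester class shows up as a failing shared test rather than as a silent mismatch.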
@require_torch
class Ministral3IntegrationTest(unittest.TestCase):
integration tests are verified to work on H100
## Usage examples

```py
import torch
```
It's verified that this example works
Ministral when?
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- [mistralai/Ministral-3-8B-Base-2512](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512)
- [mistralai/Ministral-3-8B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512)
- [mistralai/Ministral-3-8B-Reasoning-2512](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512)
juliendenize left a comment
Left some fixes for the code snippets
Out of interest: if the only difference here is that the attn layer now supports L4-style rope extension, why was a whole new arch made instead of extending the regular Mistral LM arch with L4 rope support?
Yeah we tried to sneak in Ministral with just a few lines of changes here: #42045 (review)
But transformers design philosophy is to make a new model for this as you can see from comment above 👍
Transformers maintainers can probs give a better answer to this than me though
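To make the "L4-style rope extension" in this thread concrete, here is a rough sketch of a position-dependent attention scale. It assumes the phrase refers to the query scaling used in Llama 4, where query states are multiplied by a factor that stays at 1 for short contexts and grows logarithmically with position. The function name, the formula's exact form, and the default values `floor_scale=8192.0` and `scale_factor=0.1` are assumptions for illustration; the code in this PR may differ.

```python
import math


def llama4_style_attn_scale(
    position: int,
    floor_scale: float = 8192.0,  # assumed default, not taken from this PR
    scale_factor: float = 0.1,    # assumed default, not taken from this PR
) -> float:
    """Return a multiplier for the query states at a given token position.

    The multiplier is exactly 1.0 while (position + 1) < floor_scale and
    grows with the log of the number of completed floor_scale-sized blocks.
    """
    completed_blocks = math.floor((position + 1.0) / floor_scale)
    return math.log(completed_blocks + 1.0) * scale_factor + 1.0


if __name__ == "__main__":
    for pos in (0, 8190, 8191, 100_000):
        print(pos, llama4_style_attn_scale(pos))
```

Because the factor depends only on the position and two config values, it can be added to an attention layer without touching the rotary embedding itself, which is presumably why the thread describes it as a small change on top of the existing Mistral architecture.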
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…rmers into add_ministral_3
…/transformers into add_ministral_3
else:
    raise ValueError(f"Unknown config type {type(config)}.")

# let's swap nn.Linear to FP8 Linear before loading
does that make sense?
[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ministral3
* Up
* WIP
* WIP
* WIP
* Apply suggestions from code review
  Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
* Update src/transformers/models/ministral3/configuration_ministral3.py
  Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
* fix most tests
* update docsting
* fixup
* typo in the ocnfig
* make the last 3 tests pass
* fix auto
* nits
* WIP
* WIP
* WIP
* per tensor
* WIP
* WIP
* WIP
* style
* fixup
* WIP
* WIP
* WIP
* hack for now
* add todo
* fixup
* WIP

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: medmekk <mekk.cyber@gmail.com>
This PR adds Ministral 3.